Convolutional neural networks (CNNs) offer high accuracy in image detection. To implement CNN-based image detection on Internet of Things (IoT) devices, a streaming hardware accelerator is proposed. The proposed accelerator optimizes energy efficiency by avoiding unnecessary data movement. With a unique filter decomposition technique, the accelerator can support arbitrary convolution window sizes. In addition, the max pooling function can be computed in parallel with convolution by using a separate pooling unit, thus improving throughput. A prototype accelerator was implemented in TSMC 65 nm technology with a core size of 5 mm². The accelerator can support major CNNs and achieves 152 GOPS peak throughput and 434 GOPS/W energy efficiency at 350 mW, making it a promising hardware accelerator for intelligent IoT devices.
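The abstract does not spell out how the filter decomposition works. As a minimal sketch of one way such a scheme could operate, the Python below assumes a compute unit with a fixed 3x3 window: a larger kernel is split into zero-padded 3x3 sub-kernels, each sub-kernel is applied to the input at its own offset, and the partial results are accumulated. The 3x3 tile size and the names `TILE`, `decompose_kernel`, and `decomposed_correlate` are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: the 3x3 native window and all names are assumptions.
TILE = 3  # assumed fixed window size of the hardware compute unit


def correlate2d_valid(image, kernel):
    """Reference: direct 'valid' 2D correlation (the CNN convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(ow)
        ]
        for r in range(oh)
    ]


def decompose_kernel(kernel, tile=TILE):
    """Split a kernel into zero-padded tile x tile sub-kernels with offsets."""
    kh, kw = len(kernel), len(kernel[0])
    parts = []
    for r0 in range(0, kh, tile):
        for c0 in range(0, kw, tile):
            sub = [
                [kernel[r0 + i][c0 + j] if (r0 + i < kh and c0 + j < kw) else 0
                 for j in range(tile)]
                for i in range(tile)
            ]
            parts.append((r0, c0, sub))
    return parts


def decomposed_correlate(image, kernel, tile=TILE):
    """Accumulate the outputs of fixed-size sub-kernels at shifted positions."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    # Zero-pad the input on the bottom/right so padded sub-kernels stay in range;
    # the padded pixels only ever meet zero weights, so the result is unchanged.
    pad_h, pad_w = -kh % tile, -kw % tile
    padded = [row + [0] * pad_w for row in image]
    padded += [[0] * (iw + pad_w) for _ in range(pad_h)]
    out = [[0] * ow for _ in range(oh)]
    for r0, c0, sub in decompose_kernel(kernel, tile):
        for r in range(oh):
            for c in range(ow):
                out[r][c] += sum(
                    padded[r + r0 + i][c + c0 + j] * sub[i][j]
                    for i in range(tile) for j in range(tile)
                )
    return out


if __name__ == "__main__":
    image = [[(r * 7 + c) % 5 for c in range(7)] for r in range(7)]
    kernel = [[i - j for j in range(5)] for i in range(5)]  # 5x5 window
    same = decomposed_correlate(image, kernel) == correlate2d_valid(image, kernel)
    print("decomposed result matches direct convolution:", same)
```

Under these assumptions a 5x5 kernel maps to four 3x3 passes, so a fixed-size compute unit can cover larger windows at the cost of extra passes and some wasted multiplications on the zero padding.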